Abstract: An interactive algorithm for soft segmentation of natural images is presented in this paper. The user first roughly scribbles different regions of interest, and from them the whole image is automatically segmented. This soft segmentation is obtained via fast, linear-complexity computation of weighted distances to the user-provided scribbles. The adaptive weights are obtained from a series of Gabor filters, and are automatically computed according to the ability of each single filter to discriminate between the selected regions of interest. We present the underlying framework and examples showing the capability of the algorithm to segment diverse images.


I. Introduction

Image segmentation consists of separating an image into different regions, and is one of the most widely studied problems in image processing. There are three main segmentation categories: fully automatic methods, semi-automatic methods, and (almost) completely manual ones. The framework proposed here falls in the semi-automatic category. In particular, the segmentation is obtained after the user has provided rough scribbles labelling the regions of interest. This type of user intervention can help to segment particularly difficult images. Moreover, it is often imperative to mark the regions of interest, since they depend entirely on the user and the application. For example, the user might be interested in separating a selected object (foreground) from the rest of the image (background), independently of how complicated this background is.

Work supported by the National Science Foundation, the Office of Naval Research, the National Geospatial-Intelligence Agency, and DARPA. AP performed part of this work while visiting ECE at the University of Minnesota. GS performed part of this work while on leave at the IMA.


A number of very inspiring and pioneering user-assisted segmentation-type algorithms of the style presented here have recently been introduced in the literature. The level-set method proposed in [16] for cartoon colorization initializes a curve at the user-provided scribble and evolves it until it reaches the boundaries of the region of interest. The speed of the moving front depends on local features and global properties of the image. In [21], the authors present a segmentation algorithm based on the assumption that if a pixel is a linear combination of its neighbors, then its label will be the same linear combination of its neighbors' labels. In this way, the user-provided labels (scribbles) are propagated. This is an extension of the learning algorithm developed in [18]. The authors of [25] propose user-assisted segmentation as a particular example of clustering with side information. Grady et al. [8] have also proposed a user-assisted segmentation algorithm. The image is seen as a graph whose nodes are the pixels and whose edges join neighboring pixels. They then compute the probability that a random walker starting from an unlabelled pixel first reaches the scribbles of each label, and assign the pixel to the label with the highest probability. The random walk is biased by weights on the edges, these being a function of the intensity gradient. Minimum-cut energy minimization algorithms were proposed in [3], [11], [17]. Although these were developed mainly for foreground/background separation (see also [1], [2]), they could in principle be extended to multiple objects, as addressed here (additional relationships between these approaches and ours will be presented throughout the text). Other interactive algorithms are not based on scribbles but on the user helping to trace the boundary of the objects of interest, e.g., [6], [13]. When compared with scribble-based techniques, these algorithms have often been found to be less robust and to require more user interaction [17]. Finally, and in particular because we obtain a soft segmentation, we should note that the framework introduced here is also related to, and can be used for, matting, that is, soft separation of foreground from background, e.g., [22].

The pioneering interactive image segmentation approaches just mentioned are mostly based on image gray (or color) values, thereby limiting their use, for example, for textured data. To address this, we introduce the use of an adaptive set of Gabor-based features. User-assisted image segmentation must be fast, preferably real-time. As detailed below, the core computational effort in our framework is linear in the number of pixels. Graph-cut based approaches have also reported very reasonable running times (see also [4]), although their complexity is not linear.

In order to address the above-mentioned key challenges (working fast and on a large class of images), we present an interactive image segmentation approach inspired by the colorization work in [24], where the goal is to add color (or other special effects) to a given monochromatic image. In that work, following [10], a series of color scribbles is provided on a luminance-only image, and geodesic distances computed from the same luminance channel are then used to compute the probability for a pixel to be assigned to a particular scribble. More specifically, let $s$ and $t$ be two pixels of the image $\Omega$ and $C_{s,t}$ a path over the image connecting them, parametrized by $p \in [0, 1]$. Let $Y$ also stand for the provided luminance channel. The geodesic distance between $s$ and $t$ is defined by:

$$ d(s,t) := \min_{C_{s,t}} \int_0^1 \left| \nabla Y \cdot \dot{C}_{s,t}(p) \right| dp. \tag{1} $$

This distance can be efficiently computed in linear time [23] and, in contrast with work such as [8] and [15], is related to solving a first-order Hamilton-Jacobi equation rather than a diffusion or Poisson one. Let $\Omega_c$ be the set of pixels labelled by the user, in other words, the user-provided scribbles, with color indications in this case (later on, for segmentation, these scribbles will correspond to region labels). Then, the distance from a pixel $t$ to a label $l_i$,¹ $i \in [1, N_l]$, is

$$ d_i(t) := \min_{s \in \Omega_c \,:\, \mathrm{label}(s) = l_i} d(s,t), $$

and the probability for $t$ to be assigned to the label $l_i$ is given by:

$$ \Pr(t \in l_i) = \frac{d_i(t)^{-1}}{\sum_{j \in \mathrm{labels}} d_j(t)^{-1}}. $$

¹ Each label represents a color or a segment.


This probability is used to weight the amount of color the pixel $t$ will receive from the color in the scribble (label) $l_i$; see [24] for details. In addition to its use for colorization and other special effects as presented in [24], this probability assignment can also be seen as a first step towards soft segmentation, and this is exploited and extended in this work. Thanks to the use of the linear-time fast marching technique of [23] to compute the geodesic distance in Equation (1), the core algorithm has linear complexity in the number of pixels, which is the best we can obtain, since we have to visit each pixel at least once.
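As a minimal sketch of Equation (1) and of the label probabilities, the code below computes $d_i(t)$ on a 4-connected pixel graph, charging $|Y(b) - Y(a)|$ for a step between neighbouring pixels $a$ and $b$ (a discrete stand-in for the integral of $|\nabla Y \cdot \dot{C}_{s,t}|$). It deliberately swaps in a heap-based Dijkstra search, which costs O(n log n) and suffers from the metrication errors that the linear-time fast marching of [23] avoids; the function names and the small `eps` guard for scribble pixels (where $d_i = 0$) are illustrative assumptions.

```python
import heapq
import math


def geodesic_distance(Y, seeds):
    """Multi-source geodesic distance d_i(t) on a 4-connected pixel grid.

    Y     -- 2-D list of luminance values (list of rows of floats)
    seeds -- iterable of (row, col) scribble pixels of one label l_i
    Stepping between neighbours a and b costs |Y[b] - Y[a]|, a discrete
    stand-in for the integral of |grad Y . C'(p)| in Equation (1).
    Uses Dijkstra's algorithm, not the linear-time fast marching of [23].
    """
    h, w = len(Y), len(Y[0])
    dist = [[math.inf] * w for _ in range(h)]
    heap = []
    for r, c in seeds:  # every scribble pixel starts at distance 0
        dist[r][c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry: a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + abs(Y[nr][nc] - Y[r][c])
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def label_probabilities(dists, eps=1e-6):
    """Pr(t in l_i) = d_i(t)^-1 / sum_j d_j(t)^-1 for every pixel t.

    dists -- list of distance maps, one per label
    eps   -- hypothetical guard so scribble pixels (d_i = 0) do not divide by 0
    """
    h, w = len(dists[0]), len(dists[0][0])
    probs = [[[0.0] * w for _ in range(h)] for _ in dists]
    for r in range(h):
        for c in range(w):
            inv = [1.0 / (d[r][c] + eps) for d in dists]
            total = sum(inv)
            for i, v in enumerate(inv):
                probs[i][r][c] = v / total
    return probs
```

Seeding all scribble pixels of a label at distance 0 in a single search yields the minimum over $s$ in the definition of $d_i(t)$ directly, instead of running one search per scribble pixel. For a 2 × 4 image whose left half is 0 and right half is 1, with one scribble per half, every pixel ends up with probability close to 1 for the label on its own side.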

Inspired by the ideas on colorization just described, in this paper we propose a semi-automatic algorithm for the segmentation of natural images. We generalize the weights for the geodesic distance, going beyond simple gradients, thereby making it possible to handle significantly more complicated data, while keeping the low computational cost of the geodesic computation. The remainder of this paper is organized as follows. In Section II we present the general proposed framework for user-assisted segmentation. Then, in Section III we detail how we compute the weights used to replace the simple luminance gradient in Equation (1). Examples are provided throughout the paper while introducing the key concepts, and additional ones are presented in Section IV. Finally, in Section V we conclude the paper and present possible directions for future research.

II. General framework description
Fig. 1. Figure (b) (resp. (d)) shows the geodesic distance from each pixel to the red label for the image (a) (resp. (c)). Only luminance gradients, following Equation (1), are used in this case. While this geodesic computation is sufficient to segment the image in Figure (a), in (d) we notice that the geodesic distance does not contain enough relevant information about the different textured regions. (This is a color figure.)




As detailed above, following the colorization work, the use of fast geodesic computations is a very interesting way to perform semi-automatic segmentation starting from user-provided labels. In its original form, this method assumes that the gradient of the intensity (or color) is low inside the region of interest and high at the boundaries. Although there are many images where this assumption is reasonable, it clearly fails, for example, for images containing textures; see Figure 1. While preserving the general idea of obtaining a soft segmentation by geodesic propagation of user-provided labels, we would like to use different weights in defining the distance in Equation (1). In other words, we propose to replace the $\nabla Y$ term, representing the gradient of the luminance channel, by a more elaborate weighting function, and then still derive the soft segmentation via the fast geodesic computation following [23], [24]. We basically consider the image grid as a graph with pixels as nodes and edges connecting neighboring pixels. In this framework, the geodesic distance can be seen as the cost of the shortest path on this graph.²

A color image can be described by different combinations of channels. The most common are (R, G, B) and (Y, Cb, Cr) (luminance/chrominance), but we can also build many additional channels, for example, by filtering the luminance Y. We then represent the image as a bank of $N_c$ channels, $(F_i)_{i=1}^{N_c}$, and use the information contained in this bank to define the weights for the geodesic definition.³ When computing the weights, we can restrict the used information to the positions of the user-provided labels (scribbles). To summarize, the general expression for the weights $W_i$ for each scribble (label) $i$ of the $N_l$ provided by the user is given by

$$ W_i = f(F_1, \ldots, F_{N_c}, \Omega_c, i), \quad i = 1, \ldots, N_l, $$

where, again, $W_i$ is the weight function on the graph for the fast geodesic computation associated with the label $l_i$ (to replace the luminance gradient in Equation (1)), and, as before, $\Omega_c$ is the set of pixels corresponding to the user-provided scribbles. The objective is to design $W_i$ such that the weights are low inside the region of interest labelled with $l_i$ and high outside. Then, we can efficiently compute the weighted distance maps $d_i$ for $i \in \{1, \ldots, N_l\}$, and assign each pixel to its closest label (or leave this as a soft clustering). In the next section, we show how to compute these weights $W_i$.

² Note that the use of the linear-complexity technique in [23] avoids the classical metrication errors of Dijkstra and graph-cut algorithms that operate on such graphs, thereby providing more accurate results.

³ The larger the class of images that we want to address with a single algorithm, the richer this bank of channels needs to be, thereby increasing the complexity of the proposed approach. This type of rich representation is needed by all techniques working with large image classes, and is thereby a step intrinsic to all general segmentation algorithms. To reduce the complexity, a different subset of filters can be used for each pre-established image class.

We should note that both the pixel values at the user-provided scribbles (see below) and their actual positions are explicitly used in the segmentation. This makes it possible, for example, to avoid wrongly disconnected foreground and/or background segments.

III. Design of the weight functions

The weight design includes several components. First, we have to define the set of channel images $F_i$. Then, we have to show how to adaptively select the relevant subset from them, or how to weight each channel $F_i$ differently. This is critical since, when selecting a large number of channels, as needed to address a rich spectrum of data, the critical information for a particular region in a particular image is mostly in a few channels. If not explicitly addressed, this information will be obscured by any metric comparing the whole set of channels, and the different regions will not be well separated. In this section, we first describe the selection made to create the channels $F_i$. We then consider the particular case where the goal is just to segment two different regions in the image, and show how to adaptively weight the different channels. Later we extend this to more than two regions.


A. Selecting the channels

In addition to the luminance and chrominance channels Y, Cb, Cr, we use a bank of 16 Gabor filters (4 scales and 4 orientations) on the channel Y (thus $N_c = 19$ in our experiments). This type of filter has been frequently used in the literature to deal with texture, e.g., [9], [19]. The basic two-dimensional Gabor function, $g(x,y)$, is a harmonic modulated by a Gaussian,

$$ g(x,y) = \left( \frac{1}{2\pi \sigma_x \sigma_y} \right) \exp\left[ -\frac{1}{2} \left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) + 2\pi j u_0 x \right], $$

where the standard parameters $u_0$, $\sigma_x$, and $\sigma_y$ control the frequency and width of the filter. The 16 impulse filter responses are obtained by appropriate rotations and dilations of this basic function. The idea is that, locally, these filters express the scale and orientation of a texture. For details about the mathematical properties of Gabor functions, we refer the interested reader to [12]. To speed up the computation, for example, these filters could be replaced by the steerable pyramid [20].
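As an illustration of the channel construction, the sketch below builds a 16-kernel Gabor bank (4 frequencies × 4 orientations), filters a luminance image, and applies the saturation and N × N averaging described in the next paragraph. Several choices are assumptions rather than values from this paper: an isotropic envelope ($\sigma_x = \sigma_y$), the frequencies and the rule σ = 0.5/u₀, truncation at 3σ, zero padding, keeping the magnitude of the complex response, and tanh as the saturating nonlinearity (with α = 0.25 and N = 5 as defaults). All function names are hypothetical.

```python
import cmath
import math


def gabor_kernel(u0, theta, sigma):
    """Complex Gabor kernel g(x, y): Gaussian envelope times a complex harmonic.

    Assumes sigma_x = sigma_y = sigma, so rotating the filter only rotates the
    harmonic, which runs along orientation theta with frequency u0.
    """
    radius = math.ceil(3 * sigma)  # truncate the envelope at 3 sigma
    norm = 1.0 / (2 * math.pi * sigma * sigma)
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = norm * math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * cmath.exp(2j * math.pi * u0 * xr))
        kernel.append(row)
    return kernel


def gabor_bank(freqs=(0.4, 0.2, 0.1, 0.05), n_orient=4):
    """4 scales x 4 orientations = 16 kernels (sigma = 0.5 / u0 is hypothetical)."""
    return [gabor_kernel(u0, k * math.pi / n_orient, 0.5 / u0)
            for u0 in freqs for k in range(n_orient)]


def filter_magnitude(Y, kernel):
    """|response| of a 2-D list Y to a kernel, with zero padding at the border."""
    h, w, r = len(Y), len(Y[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0j
            for dy in range(-r, r + 1):
                if 0 <= i + dy < h:
                    for dx in range(-r, r + 1):
                        if 0 <= j + dx < w:
                            acc += Y[i + dy][j + dx] * kernel[dy + r][dx + r]
            out[i][j] = abs(acc)
    return out


def regularize(G, alpha=0.25, N=5):
    """Saturate with tanh(alpha * G / std(G)), then average over an N x N
    window (clipped at the image border)."""
    vals = [v for row in G for v in row]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
    S = [[math.tanh(alpha * v / std) for v in row] for row in G]
    h, w, r = len(S), len(S[0]), N // 2
    F = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = [S[a][b]
                   for a in range(max(0, i - r), min(h, i + r + 1))
                   for b in range(max(0, j - r), min(w, j + r + 1))]
            F[i][j] = sum(win) / len(win)
    return F
```

The loop in `filter_magnitude` is a correlation, but because the image is real and this kernel satisfies $g(-x,-y) = \overline{g(x,y)}$, its magnitude equals that of the true convolution. The luminance channel Y and the two chrominance channels would be appended to the 16 regularized responses to obtain the $N_c = 19$ channels.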

Since natural textures can be complex and noisy, we regularize the outputs of the Gabor filters. First, we saturate excessively high values by a nonlinear transformation. Then, we smooth the variations with an averaging operator. Therefore, if $G_i$ is the response of a Gabor filter applied to Y, $\Omega$ the domain of the image, and $\Omega_x$ an $N \times N$ window around the pixel $x$, we consider the channel $F_i$ given by:

$$ \forall x \in \Omega : \quad F_i(x) = \frac{1}{N^2} \sum_{y \in \Omega_x} \tanh\left( \alpha \, \frac{G_i(y)}{\sigma(G_i)} \right), $$

where $\alpha$ and $N$ are parameters experimentally set to 0.25 and 5, respectively, and $\sigma(\cdot)$ stands for the standard deviation.

We have just defined the bank of channels $F_i$ that represent the image. We now show how to adaptively weight them to define the global weight to be used in the geodesic computation.

B. Segmenting two uniform regions

Let us begin with the segmentation of two uniform regions.⁴ Let $\Omega_1$ and $\Omega_2$ be the sets of pre-labelled pixels for the user-provided labels $l_1$ and $l_2$, respectively. On each of the $N_c$ channels, we first approximate the probability density function (PDF), with the samples on $\Omega_1$ and $\Omega_2$, by a Gaussian (see also [3], [17]).⁵ We then compute the likelihood for a pixel $x$ to be assigned to the label $l_1$ based on the channel $F_i$:

$$ P^i_{1|2}(x) := \frac{p^i_1(F_i(x))}{p^i_1(F_i(x)) + p^i_2(F_i(x))}, $$

where $p^i_j$ is the PDF of $\Omega_j$ on $F_i$; see Figure 2. Similarly, we compute $P^i_{2|1}$.

Fig. 2. Expression of the probability for the pixel $x$ to be in region 1 with respect to $p^i_1$ and $p^i_2$. In this case $x$ is much more likely to be in region 2 than in region 1. (This is a color figure.)

Then, the probability for a pixel $x$ to be assigned to $l_1$ is given by

$$ P_{1|2}(x) := \Pr(x \in l_1) = \sum_{i=1}^{N_c} w_i \, P^i_{1|2}(x), \tag{2} $$

where $w_i$ are weights reflecting the ability of the channel $i \in \{1, \ldots, N_c\}$ to discriminate between the two regions of interest (their computation will be explained below). Then, the weight associated with the geodesic computation for the label $l_1$ is given by

$$ W_1 = W_{1|2} = 1 - P_{1|2}. $$

$W_2$ is similarly obtained; see Figure 3.⁶

As in [3], we could re-estimate the probability functions as the algorithm progresses. This has the advantage of creating richer representatives, at the cost of additional computations and the risk of including wrongly assigned pixels in the estimation.

⁴ Here, we use the word uniform in the sense that the regions can be discriminated by the set of computed channels.

⁵ Although other fitting functions might be more appropriate, we found this sufficient for the very good results reported here.
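A minimal sketch of the two-region computation of Section III-B at one pixel: one Gaussian fitted per channel and per scribble, the per-channel ratio $P^i_{1|2}$, the weighted combination of Equation (2), and the geodesic weight $W_1 = 1 - P_{1|2}$. The function names are hypothetical, the channel weights `w` are taken as given (their computation is described in the next subsection), the normalization of footnote 6 is omitted, and the variance floor and the 0.5 fallback when both PDFs underflow are assumptions added for numerical safety.

```python
import math


def fit_gaussian(samples):
    """Gaussian fit (mean, variance) of one channel over one scribble."""
    m = sum(samples) / len(samples)
    var = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, max(var, 1e-12)  # floor: avoid a zero-width Gaussian


def gauss_pdf(v, m, var):
    return math.exp(-(v - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def prob_label1(features, pdfs1, pdfs2, w):
    """P_{1|2}(x) = sum_i w_i * P^i_{1|2}(x), Equation (2), at one pixel x.

    features     -- channel values F_i(x)
    pdfs1, pdfs2 -- per-channel (mean, var) fitted on Omega_1 and Omega_2
    w            -- channel weights w_i, assumed to sum to 1
    """
    total = 0.0
    for v, (m1, v1), (m2, v2), wi in zip(features, pdfs1, pdfs2, w):
        a, b = gauss_pdf(v, m1, v1), gauss_pdf(v, m2, v2)
        ratio = a / (a + b) if a + b > 0 else 0.5  # P^i_{1|2}(x)
        total += wi * ratio
    return total


def weight_label1(features, pdfs1, pdfs2, w):
    """Geodesic weight W_1(x) = 1 - P_{1|2}(x): low where x resembles region 1."""
    return 1.0 - prob_label1(features, pdfs1, pdfs2, w)
```

With scribble samples around 0 for $l_1$ and around 1 for $l_2$ on a single channel, a pixel with value 0 gets $P_{1|2} \approx 1$ (so $W_1 \approx 0$), and a pixel halfway between the two means gets $P_{1|2} \approx 0.5$.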


C. Weighting of the channels

We have chosen to consider a number of channels because it makes it possible to characterize a wide range of images. On the other hand, for one specific image, there are often very few channels that are relevant for the discrimination, and using the others will only mislead and hide the useful information.⁷ We then need to find the relevant channels, relying on the user-provided scribbles/labels.⁸

⁶ In fact, we normalize each $W_i$ by dividing by its standard deviation in order to make them comparable.

⁷ As an example, consider the case where $N_c \gg 1$ and $N_c - 1$ channels are identical for both regions, but one is very different. Using all $N_c$ channels will in general lead to considering both regions the same.

⁸ We assume that the user is not an adversary: if he/she marked scribbles in different regions, it is because the data around the scribbles are different and useful for performing the segmentation.

Fig. 3. Two examples of segmentation into two uniform regions, showing the importance of adaptive weights. (a) The user-scribbled image. (b) The segmented image using all equal weights; note the significant errors in the segmentation. (c) The automatically computed weights for each channel. The horizontal axis indicates the different $N_c = 19$ channels, and the vertical axis their corresponding weights. While the top image mostly uses a chrominance channel, the bottom one strongly uses three of the Gabor channels obtained by filtering the luminance. (d) The segmented image, with automatically computed adaptive weights. (This is a color figure.)

To compute the relevance of an individual channel for a given image/region, we assume that the PDFs of the regions of interest are represented by the PDFs obtained using the information in the scribbles $\Omega_1$ and $\Omega_2$. With this in mind, we evaluate the probability $P^i_e$ for a random point $s$ in the image to be assigned to the wrong label, as a function of the previously computed PDFs on the scribbles ($p^i_1$ and $p^i_2$):

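$$ \begin{aligned} P^i_e &= \Pr(s \in 1)\,\Pr(s \to 2 \mid s \in 1) + \Pr(s \in 2)\,\Pr(s \to 1 \mid s \in 2) \\ &= \frac{1}{2} \int_{\{x \,:\, p^i_2(x) > p^i_1(x)\}} p^i_1(x)\,dx + \frac{1}{2} \int_{\{x \,:\, p^i_1(x) > p^i_2(x)\}} p^i_2(x)\,dx, \end{aligned} $$

where both regions are taken to be equally likely a priori, $\Pr(s \in 1) = \Pr(s \in 2) = 1/2$, and a point is assigned to the label with the larger PDF. Since on each integration domain the integrand is the smaller of the two PDFs, we deduce that

$$ P^i_e = \frac{1}{2} \int \min\left( p^i_1(x), p^i_2(x) \right) dx. $$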


This quantity has been shown by Dunn and Higgins [7] to be a good criterion for channel selection. From it, we directly deduce the weights for each of the channels (to be used in Equation (2)):


$$ \forall i = 1, \ldots, N_c : \quad w_i = \frac{(P^i_e)^{-1}}{\sum_{k=1}^{N_c} (P^k_e)^{-1}}. $$


Figure 3 shows that, depending on the type of image and the regions we want to segment, the weights are concentrated on different channels: either the Gabor filter channels, or the luminance or chrominance channels. This is automatically computed with the technique just described. The figure also shows the critical importance of adapting the weights to the data.
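To make the channel-selection criterion concrete, here is a sketch that evaluates $P^i_e = \frac{1}{2}\int \min(p^i_1, p^i_2)\,dx$ for two Gaussian channel PDFs by a midpoint Riemann sum, and then normalizes the inverse errors into the weights $w_i$. The integration range of ±8 standard deviations, the number of samples, the floor guarding against a zero error, and the function names are assumptions of this sketch, not parameters from the paper.

```python
import math


def gauss_pdf(x, m, var):
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bayes_error(m1, var1, m2, var2, n=4000):
    """P_e = 1/2 * integral of min(p1, p2), via a midpoint Riemann sum over
    +-8 standard deviations around both means."""
    s = math.sqrt(max(var1, var2))
    lo, hi = min(m1, m2) - 8 * s, max(m1, m2) + 8 * s
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        total += min(gauss_pdf(x, m1, var1), gauss_pdf(x, m2, var2))
    return 0.5 * total * dx


def channel_weights(errors, floor=1e-12):
    """w_i = (P_e^i)^-1 / sum_k (P_e^k)^-1; the floor avoids dividing by 0."""
    inv = [1.0 / max(e, floor) for e in errors]
    z = sum(inv)
    return [v / z for v in inv]
```

For two unit-variance Gaussians two standard deviations apart, $P_e \approx 0.159$, whereas identical PDFs give the maximum $P_e = 0.5$. A channel with error 0.1 therefore receives five times the weight of one with error 0.5 (`channel_weights([0.5, 0.1])` gives 1/6 and 5/6).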

We should mention that, in the segmentation results presented in this paper, for simple visualization, we hard-threshold the soft segmentation obtained from the framework just described. Recall that every pixel is assigned a probability of belonging to each of the regions represented by the user-provided scribbles. Such probabilities can be regularized, e.g., [14], before hard assignment. This will, for example, regularize the contours; see Figure 4. Such regularization needs to be performed only around regions of borderline decisions, thereby not adding significant computational cost. In order to simplify the presentation and to concentrate on the novel contributions, for the rest of this paper we visualize only the results of hard assignment, without any regularization.


D. Multiple uniform regions

In the previous section, based on the user-provided scribbles, we compared the properties of two regions in order to discriminate them. If we want to segment an image into more than two uniform regions, we might not be able to find one particular channel that discriminates well between one region and all the others. Therefore, we first compute an optimal weight function for each pair of regions and then combine them to build a global weight function for the region (to use in the geodesic computation), in such a way that it is low inside the region and high everywhere else.

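Let $\{l_1, \ldots, l_{N_l}\}$ be the set of labels and $W_{i|j}$ the weight function for the label $l_i$ when competing only with $l_j$.⁹ We want the global weight function of $l_i$ to be very low inside its region and very high outside. We therefore define $W_i$ as

$$ W_i = \sum_{j \neq i} W_{i|j}. \tag{3} $$

Every term is low inside the region of $l_i$, while for a pixel in the region of any other label $l_j$ at least the term $W_{i|j}$ is high, so the sum is high everywhere outside the region of $l_i$. Figure 5 graphically shows how this method builds a good weight function. An example is presented in Figure 6.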
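Equation (3) is a plain sum of the pairwise maps. The sketch below assumes each $W_{i|j}$ is already available as a 2-D list (for instance from the two-region procedure) through a caller-supplied function, and uses a hypothetical three-pixel toy image to show why summing works: each pairwise map alone can miss the third region, but the sum cannot.

```python
def global_weight(i, pair_weight, n_labels):
    """W_i(x) = sum_{j != i} W_{i|j}(x), Equation (3).

    pair_weight(i, j) must return the two-label weight map W_{i|j} as a
    2-D list, computed as if only labels i and j existed.
    """
    maps = [pair_weight(i, j) for j in range(n_labels) if j != i]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(w)] for r in range(h)]


def toy_pair_weight(i, j):
    """Hypothetical 1x3 image whose pixel k lies in region k. W_{i|j} is 0 on
    region i, 1 on region j, and 0 on the third region (worst case: the
    channels separating i from j cannot tell the third region from i)."""
    return [[0.0 if k == i else 1.0 if k == j else 0.0 for k in range(3)]]
```

Here `global_weight(0, toy_pair_weight, 3)` returns `[[0.0, 1.0, 1.0]]`: pixel 2 receives weight 0 from $W_{0|1}$ alone, but $W_{0|2}$ raises it, so the combined weight is low only inside region 0.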

⁹ $W_{i|j}$ is computed as in the previous section, as if there were only two labels, $l_i$ and $l_j$.

Fig. 4. Regularization effects on the probability distribution (top row), followed by its effects on the region boundary (bottom row). Original images on the left and regularized ones on the right. Although here the whole probability is regularized for illustration purposes, only the region of borderline decisions needs to be processed. (This is a color figure.)

Fig. 5. Principle for the design of a weight function in the configuration of three regions to be segmented, A, B, and C. (This is a color figure.)

Fig. 6. Segmentation example with more than two regions. (This is a color figure.)

E. Non-uniform regions

In many pictures, objects and background are not uniform (they are composed of many objects), and the PDFs for each label cannot be modelled by a simple Gaussian as we previously did. We could of course use other models, such as mixtures of Gaussians (see for example [17]). Continuing with our philosophy of letting the user help, instead of computationally complicating the algorithm, we opt for a different approach. Although, for example, the background is not composed of a single region, we can often easily distinguish several uniform areas in it. The choice we made here is to let the user decide about the uniform sub-regions and scribble the region using several components (see Figure 7). Then, we consider each component as an independent label (we call them sub-labels), and we are back to the previous problem of multiple uniform regions. The only difference is that we do not need to make two components of the same label compete against each other, since we do not need to discriminate between them. Let $l_i^j$ be the component $j$ of the label $l_i$. Then its weight function, $W_i^j$, is given by:

$$ W_i^j = \sum_{k \neq i} \sum_{l} W_{i^j \mid k^l}. \tag{4} $$

After segmentation, we merge the regions assigned to components coming from the same label.
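A sketch of Equation (4) under the same assumptions as the multi-region case: the pairwise sub-label maps are supplied by a caller-provided function (hypothetical here), the double sum skips every component of the pixel's own label, and a final merge maps each winning sub-label back to its user label.

```python
def sublabel_weight(i, j, pair_weight, n_components):
    """W_i^j(x) = sum_{k != i} sum_l W_{i^j | k^l}(x), Equation (4).

    n_components[k] is the number of scribbled components of label k, and
    pair_weight((i, j), (k, l)) returns the two-sub-label weight map as a
    2-D list. Components of the same label i never compete.
    """
    maps = [pair_weight((i, j), (k, l))
            for k, n in enumerate(n_components) if k != i
            for l in range(n)]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(w)] for r in range(h)]


def merge_sublabels(assignment):
    """After segmentation, map each pixel's winning sub-label (k, l) to label k."""
    return [[k for (k, _l) in row] for row in assignment]
```

With two background components and one foreground component (`n_components = [2, 1]`), each background component competes only with the single foreground component, while the foreground component competes with both background components.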

IV. Additional examples

We now present additional examples of the proposed framework for interactive natural image segmentation. First, in Figure 8 we present two very different images, showing the generality of our proposed framework. Figure 9 exemplifies the robustness of the algorithm with respect to the positions of the user-provided scribbles. Since the segmentation is based on geodesic distances, we can explicitly apply the triangle inequality to study the robustness of the algorithm. Thereby, the error in the probability assignment of a given pixel is upper-bounded by the geodesic distance between the user-placed scribbles in the two different scenarios (assuming the PDFs for both scribbles are the same since they are in the same region; if not, this can be easily included in the bound). If in both cases the scribbles are placed inside the same region, as expected from a non-adversarial user, this distance is small (ideally zero), and as such, the error is small. Finally, in Figure 10 we first simulate the use of our framework in a real interactive application, where the user progressively adds scribbles to achieve the desired segmentation result (see also [3], [17]). We then show the use of this result for image composition. A simple cut-and-paste has been used, with no blending. For a real application, simple blending needs to be added; see for example [5], [17], [22]. For this, the natural soft segmentation obtained here is very useful.

Fig. 7. Example of a segmentation with non-uniform regions. Green and red scribbles do not compete among themselves, only against each other. (This is a color figure.)
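The triangle-inequality step in the robustness argument of this section can be written out for the simplest case of single-pixel scribbles $s$ (first scenario) and $s'$ (second scenario) of the same label and any pixel $t$. It uses only the symmetry and the triangle inequality of the geodesic distance $d$, and bounds the change in the distance entering the probability assignment:

$$ d(s,t) \le d(s,s') + d(s',t) \quad \text{and} \quad d(s',t) \le d(s',s) + d(s,t) \;\Longrightarrow\; \left| d(s,t) - d(s',t) \right| \le d(s,s'). $$

If $s$ and $s'$ lie in the same uniform region, the weights along a path joining them are low, so $d(s,s')$ is small and the distance from every pixel $t$ to the label changes by at most that amount.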


Fig. 8. Additional segmentation examples. In the image with the cat, the background is decomposed into two sub-labels. (This is a color figure.)


V. Concluding remarks

In this paper, we have proposed an interactive algorithm for soft image segmentation. The proposed technique is adaptable to a wide range of images thanks to the automatic weighting of the different channels involved in the segmentation. Based on the fast computation of geodesic distances, the core algorithm is linear in time and can be used for interactive image labelling.

Fig. 9. (a) and (b) are two segmentation results of the same image using different user scribbling. This shows the robustness of the algorithm: it is not necessary to scribble carefully in order to obtain very good results. (This is a color figure.)

Fig. 10. Progressive segmentation. (a) The user starts with a few minor scribbles, obtaining only a partial desired segmentation. (b) The user adds scribbles, further improving the segmentation. (c) The user completes the segmentation by marking under the arms. (d) Example of a typical application of this type of foreground/background segmentation: image composition. Note that simple cut-and-paste has been used. (This is a color figure.)

There are several directions of research to pursue with the user-oriented segmentation framework introduced here. We still want as little user work as possible, and thereby assisting the user in placing the scribbles would be valuable. For example, a simple edge detector can hint at good scribble locations. In addition, for segmenting rich images into just foreground and background, models that remain computationally simple but allow the user to provide just one scribble for the whole foreground and just one for the whole background will be very helpful. In particular, assuming that the foreground object is completely inside the image, it is interesting to study the use of the image borders as the background scribbles. We are also extending this work to video, and results on this will be reported elsewhere.